Expanding Chinese Sentiment Dictionaries from Large Scale Unlabeled Corpus

Authors

  • Hongzhi Xu
  • Kai Zhao
  • Likun Qiu
  • Changjian Hu
Abstract

Unsupervised sentiment classification usually requires a user-defined sentiment dictionary. However, the existing Chinese dictionaries are insufficient: for example, the intersection rate of two popular Chinese sentiment dictionaries, HowNet and NTUSD, is less than 10%. In this paper, we present a method that expands the dictionaries with more sentiment words by ranking them through link analysis on a word graph constructed from a large unlabeled corpus. The method also computes a sentiment polarity strength for each word in the new dictionaries. Manual evaluation shows that the method expands the dictionaries with high precision. Sentiment classification experiments show that the new dictionaries, together with the polarity strength our algorithm assigns to each word, improve performance. As a byproduct, the algorithm can also discover errors in the current dictionaries.
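The abstract does not spell out the link-analysis step, so the following is only a minimal sketch of one plausible reading: a seeded, PageRank-style random walk over a word co-occurrence graph. The function names (`build_graph`, `seeded_rank`, `polarity_strength`), the co-occurrence window, the damping factor, and the idea of scoring polarity as the difference between a positive-seed rank and a negative-seed rank are all assumptions made for illustration, not the authors' published algorithm. The input is assumed to be already word-segmented.

```python
from collections import defaultdict


def build_graph(sentences, window=2):
    """Undirected co-occurrence graph (hypothetical construction).

    Edge weight = number of times two distinct words appear within
    `window` tokens of each other in the tokenized sentences.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1:i + 1 + window]:
                if v != w:
                    graph[w][v] += 1.0
                    graph[v][w] += 1.0
    return graph


def seeded_rank(graph, seeds, damping=0.85, iterations=50):
    """Random walk with restart to the seed words (personalized PageRank)."""
    nodes = list(graph)
    seeds_in = [s for s in seeds if s in graph]
    if not seeds_in:
        return {n: 0.0 for n in nodes}
    restart = {n: (1.0 / len(seeds_in) if n in seeds_in else 0.0) for n in nodes}
    score = dict(restart)
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            if out_weight[n] == 0:
                continue
            share = damping * score[n] / out_weight[n]
            # Pass this word's score to its neighbours, proportional to edge weight.
            for m, w in graph[n].items():
                nxt[m] += share * w
        score = nxt
    return score


def polarity_strength(graph, pos_seeds, neg_seeds):
    """Assumed scoring: positive-seed rank minus negative-seed rank."""
    pos = seeded_rank(graph, pos_seeds)
    neg = seeded_rank(graph, neg_seeds)
    return {n: pos[n] - neg[n] for n in graph}


# Toy usage with placeholder English tokens standing in for segmented Chinese text.
corpus = [["good", "great", "nice"], ["good", "nice"],
          ["bad", "awful", "poor"], ["bad", "poor"]]
strength = polarity_strength(build_graph(corpus), ["good"], ["bad"])
for word, value in sorted(strength.items(), key=lambda kv: -kv[1]):
    print(word, round(value, 3))
```

In this sketch the seed words would come from the existing dictionaries (for example, entries shared by HowNet and NTUSD), and words with a large absolute score would be the candidates for expansion; a seed whose score contradicts its listed polarity could be flagged as a possible dictionary error.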

Similar resources

CT-SPA: Text sentiment polarity prediction model using semi-automatically expanded sentiment lexicon

In this study, an automatic classification method based on the sentiment polarity of text is proposed. This method uses two sentiment dictionaries from different sources: the Chinese sentiment dictionary CSWN that integrates Chinese WordNet with SentiWordNet, and the sentiment dictionary obtained from a training corpus labeled with sentiment polarities. In this study, the sentiment polarity of ...

Co-Training for Cross-Lingual Sentiment Classification

The lack of Chinese sentiment corpora limits the research progress on Chinese sentiment classification. However, there are many freely available English sentiment corpora on the Web. This paper focuses on the problem of cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data. Machine tr...

Generate Adjective Sentiment Dictionary for Social Media Sentiment Analysis Using Constrained Nonnegative Matrix Factorization

Although sentiment analysis has attracted a lot of research, little work has been done on social media data compared to product and movie reviews. This is due to the low accuracy that results from the more informal writing seen in social media data. Currently, most sentiment analysis tools for social media choose the lexicon-based approach instead of the machine learning approach because the ...

Build Chinese Emotion Lexicons Using A Graph-based Algorithm and Multiple Resources

For sentiment analysis, lexicons play an important role in many related tasks. In this paper, aiming to build Chinese emotion lexicons for public use, we adopted a graph-based algorithm which ranks words according to a few seed emotion words. The ranking algorithm exploits the similarity between words, and uses multiple similarity metrics which can be derived from dictionaries, unlabeled corpor...

Bilingual Co-Training for Sentiment Classification of Chinese Product Reviews

The lack of reliable Chinese sentiment resources limits research progress on Chinese sentiment classification. However, there are many freely available English sentiment resources on the Web. This article focuses on the problem of cross-lingual sentiment classification, which leverages only available English resources for Chinese sentiment classification. We first investigate several basic meth...

Publication date: 2010